Classification-based spoken text selection for LVCSR language modeling

نویسندگان
چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Classification-based spoken text selection for LVCSR language modeling

Large vocabulary continuous speech recognition (LVCSR) has naturally been demanded for transcribing daily conversations, while developing spoken text data to train LVCSR is costly and time-consuming. In this paper, we propose a classification-based method to automatically select social media data for constructing a spoken-style language model in LVCSR. Three classification techniques, SVM, CRF,...

متن کامل

Tunable keyword-aware language modeling and context dependent fillers for LVCSR-based spoken keyword search

We explore the potential of using keyword-aware language modeling to extend the ability of trading higher false alarm rates in exchange for lower miss detection rates in LVCSRbased keyword search (KWS). A context-dependent keyword language modeling method is also proposed to further enhance the keyword-aware language modeling framework by reducing the number of false alarms often sacrificed in ...

متن کامل

Arabic Language Text Classification Using Dependency Syntax-Based Feature Selection

We study the performance of Arabic text classification combining various techniques: (a) tfidf vs. dependency syntax, for feature selection and weighting; (b) class association rules vs. support vector machines, for classification. The Arabic text is used in two forms: rootified and lightly stemmed. The results we obtain show that lightly stemmed text leads to better performance than rootified ...

متن کامل

Towards better language modeling for Thai LVCSR

One of the difficulties of Thai language modeling is the process of text corpus preparation. Because there is no explicit word boundary marker in written Thai text, word segmentation must be performed prior to training a language model. This paper presents two approaches to language model construction for Thai LVCSR based on pseudo-morpheme merging. The first approach merges pseudo-morphemes us...

متن کامل

Multi-channel sentence classification for spoken dialogue language modeling

In traditional language modeling word prediction is based on the local context (e.g. n-gram). In spoken dialog, language statistics are affected by the multidimensional structure of the human-machine interaction. In this paper we investigate the statistical dependencies of users’ responses with respect to the system’s and user’s channel. The system channel components are the prompts’ text, dial...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: EURASIP Journal on Audio, Speech, and Music Processing

سال: 2017

ISSN: 1687-4722

DOI: 10.1186/s13636-017-0121-5